Optimize two-locus site operations #3291
base: main
Conversation
This PR is a combination of three separate modifications. They are described below and in tskit-dev#3290. Fixes (tskit-dev#3290).

* Two-locus malloc optimizations

This revision moves all malloc operations out of the hot loop in two-locus statistics, instead providing pre-allocated regions of memory that the two-locus framework will use to perform work. Instead of simply passing each pre-allocated array into each function call, we introduce a simple structure called `two_locus_work_t`, which stores the statistical results and provides temporary arrays for storing the normalisation constants. Setup and teardown methods for this work structure are provided. Python and C tests are passing and valgrind reports no errors.

* Refactor bit array API, rename to bitset

As discussed in tskit-dev#2834, this patch renames tsk_bit_array_t to tsk_bitset_t. Philosophically, we treat these as sets and not arrays, performing intersections, unions, and membership tests. Therefore, it makes sense to alter the API to use set-theoretic vocabulary, describing the intent more precisely. Fundamentally, the bitset structure is a list of N independent bitsets. Each operation on two sets must select the row on which to operate. The tsk_bitset_t originally tracked `len` only, which was N, the number of sets. For convenience, we also track the `row_len`, which is the number of unsigned integers per row. If we multiply `row_len` by `TSK_BITSET_BITS`, we get the number of bits that each set (or row) in the list of bitsets will hold. (A code sketch of this layout follows below.)

We had also discussed each set-theoretic operation accepting a row index instead of a pointer to a row within the bitset object. Now, each operation accepts a row index for each bitset structure passed into the function. This simplifies the consumption of this API considerably, removing the need to store and track many intermediate temporary array pointers. We also see some performance improvements from this cleanup. For DRY purposes, I've created a private macro, `BITSET_DATA_ROW`, which abstracts away the pointer arithmetic for selecting a row out of the list of sets. Because of these changes, `tsk_bit_array_get_row` is no longer needed and has been removed from the API.

This change does not alter the size of the "chunk", the unsigned integer storing bits. It remains a 32-bit unsigned integer, which is most performant for bit counting (popcount). I've streamlined the macros used to determine which integer in the row will store a particular bit. Everything now revolves around the TSK_BITSET_BITS macro, which is simply 32, and bitshift operations have been converted to unsigned integer division.

Testing has been refactored to reflect these changes, removing tests that operate on specific rows. Tests in C and Python are passing and valgrind shows no errors. Fixes (tskit-dev#2834).

* Precompute A/B Counts and Biallelic Summary Func

Precompute A/B counts for each sample set. We were previously computing them redundantly for each site pair in our results matrix. The precomputation happens in a function called `get_mutation_sample_sets`, which takes our list of sets (`tsk_bitset_t`) for each mutation and intersects the samples carrying a particular mutation with the sample sets passed in by the user. The result is an expanded list of sets, with one set per mutation per sample set. During this operation, we compute the number of samples containing the given allele for each mutation, avoiding the need to perform redundant count operations on the data.
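As promised above, here is a minimal sketch of the bitset layout from the refactor. The struct shape and helper follow the PR's prose (`row_len`, `TSK_BITSET_BITS`, `BITSET_DATA_ROW`, row-index operations), but the type name, field names, and signatures are illustrative assumptions, not the exact tskit definitions:

#include <stddef.h>
#include <stdint.h>

#define TSK_BITSET_BITS ((size_t) 32)

typedef struct {
    size_t len;      /* N: number of independent sets (rows) */
    size_t row_len;  /* number of 32-bit chunks per row */
    uint32_t *data;  /* len * row_len chunks, rows stored contiguously */
} bitset_sketch_t;

/* Select a row: operations take a row index, not a raw pointer. */
#define BITSET_DATA_ROW(bs, row) ((bs)->data + (row) * (bs)->row_len)

/* Membership test for bit `bit` of row `row`; bitshifts are replaced
 * by unsigned division and modulo on TSK_BITSET_BITS, as in the PR. */
static int
bitset_contains(const bitset_sketch_t *self, size_t row, size_t bit)
{
    const uint32_t *r = BITSET_DATA_ROW(self, row);
    return (r[bit / TSK_BITSET_BITS] >> (bit % TSK_BITSET_BITS)) & 1;
}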
In addition to precomputation, we add a non-normalized version of `compute_general_two_site_stat_result` for situations where we're computing stats from biallelic loci. We dispatch the computation of the result based on the number of alleles in the two loci we're comparing. If the number of alleles in both loci is 2, then we simply perform an LD computation on the derived alleles for the two loci. As a result, we remove the need to compute a matrix of LD values, then take a weighted sum. This is much more efficient and means that we only run the full multiallelic LD routine on sites that are multiallelic.
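To make the biallelic fast path concrete, here is a sketch of the direct computation it enables, using the classic D = p_AB - p_A * p_B statistic as a stand-in for the PR's summary functions. The function names and the use of GCC/Clang's __builtin_popcount are assumptions for illustration only:

#include <stddef.h>
#include <stdint.h>

/* Number of set bits shared between two bitset rows. */
static double
count_intersection(const uint32_t *a, const uint32_t *b, size_t row_len)
{
    size_t i;
    double n = 0;
    for (i = 0; i < row_len; i++) {
        n += (double) __builtin_popcount(a[i] & b[i]);
    }
    return n;
}

/* Biallelic LD on derived alleles: a and b are the derived-allele
 * sample sets for the two sites, n is the number of samples. No
 * allele matrix or weighted sum is needed in this case. */
static double
biallelic_D(const uint32_t *a, const uint32_t *b, size_t row_len, double n)
{
    double p_A = count_intersection(a, a, row_len) / n;
    double p_B = count_intersection(b, b, row_len) / n;
    double p_AB = count_intersection(a, b, row_len) / n;
    return p_AB - p_A * p_B;
}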
Codecov Report ❌
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3291      +/-   ##
==========================================
- Coverage   89.82%   89.82%   -0.01%
==========================================
  Files          29       29
  Lines       32665    32725      +60
  Branches     5982     5996      +14
==========================================
+ Hits        29342    29394      +52
- Misses       1882     1886       +4
- Partials     1441     1445       +4

Flags with carried forward coverage won't be shown.
LGTM! Overall looks like a nice cleanup with simpler code. I haven't grokked every detail, but assuming our test coverage is good I think it's fairly safe.
A few minor details pointed out, happy to merge after that.
typedef uint32_t tsk_bit_array_value_t;
// define a 32-bit chunk size for our bitsets.
// this means we'll be able to hold 32 distinct items in each 32 bit uint
#define TSK_BITSET_BITS (tsk_size_t) 32
- #define TSK_BITSET_BITS (tsk_size_t) 32
+ #define TSK_BITSET_BITS ((tsk_size_t) 32)
Extra parentheses avoid unexpected casts when using the macro
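A small illustration of the kind of surprise this guards against (the macro names here are local to the example, not from the PR):

#include <stddef.h>

#define BITS_UNSAFE (size_t) 32   /* cast not wrapped in parentheses */
#define BITS_SAFE ((size_t) 32)

int
main(void)
{
    size_t ok = sizeof BITS_SAFE;     /* fine: sizeof((size_t) 32) */
    /* size_t bad = sizeof BITS_UNSAFE;   expands to sizeof (size_t) 32,
       which the parser reads as sizeof(size_t) followed by a stray
       literal: a compile error */
    return (int) ok;
}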
return ret == TSK_TREE_OK ? 0 : ret;
}

static int
Can you give a quick comment here defining what this function does, and what the memory management cycle is? So, it looks like if it succeeds, the caller is responsible for freeing the allele_sample_sets and allele_sample_set_counts?
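For what it's worth, the requested comment might read something like this (wording is a sketch, and the ownership rules are as inferred in the question above, not confirmed):

/*
 * Intersect each mutation's sample bitset with the user-provided sample
 * sets, producing one set per mutation per sample set, along with the
 * per-mutation allele counts.
 *
 * On success (return 0), the caller owns *allele_sample_sets and
 * *allele_sample_set_counts and must free them (with tsk_bitset_free
 * and tsk_safe_free respectively). On failure, the function frees
 * anything it allocated and the caller need not clean up.
 */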
        ss_off += sample_set_sizes[j];
    }
}
out:
I think there's a memory leak here if the allele_sample_set_counts allocation succeeds but tsk_bitset_init fails. To be watertight, and for simplicity, I think we should free both of these in error conditions:

out:
    if (ret != 0) {
        tsk_safe_free(*allele_sample_set_counts);
        tsk_bitset_free(...)
    }
static int
get_mutation_sample_sets(const tsk_bitset_t *allele_samples, tsk_size_t num_sample_sets,
    const tsk_size_t *sample_set_sizes, const tsk_id_t *sample_sets,
    const tsk_id_t *sample_index_map, tsk_size_t *max_ss_size,
Generally I find it easier to follow what's a return parameter, and to keep track of what the function is doing, if they are explicitly named as such, like ret_max_ss_size, ret_allele_sample_set_counts, etc. Then, to avoid fiddly dereferencing, I'd do something like:
tsk_size_t max_ss_size = 0;
for (i = 0; i < num_sample_sets; i++) {
    max_ss_size = TSK_MAX(sample_set_sizes[i], max_ss_size);
}
....
// Just before successful function exit assign the return parameters
*ret_max_ss_size = max_ss_size;
out:
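A self-contained sketch of that pattern end to end, combining the ret_ naming with tskit's goto-out error handling (the function name, error code, and TSK_MAX definition are stand-ins):

#include <stddef.h>

#define TSK_MAX(a, b) ((a) > (b) ? (a) : (b))

static int
max_sample_set_size(const size_t *sample_set_sizes, size_t num_sample_sets,
    size_t *ret_max_ss_size)
{
    int ret = 0;
    size_t i, max_ss_size = 0;

    if (sample_set_sizes == NULL) {
        ret = -1; /* stand-in error code */
        goto out;
    }
    for (i = 0; i < num_sample_sets; i++) {
        max_ss_size = TSK_MAX(sample_set_sizes[i], max_ss_size);
    }
    /* assign return parameters only on the success path */
    *ret_max_ss_size = max_ss_size;
out:
    return ret;
}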
for (i = 0; i < n_sites; i++) {
    site_offsets[i] = n_alleles * num_sample_sets;
    n_alleles += self->site_mutations_length[sites[i]] + 1;
    if (self->site_mutations_length[sites[i]] > max_alleles) {
TSK_MAX is a handy shortcut for this
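Applied to the comparison above, that would be something like (assuming TSK_MAX from tskit's core macros):

max_alleles = TSK_MAX(max_alleles, self->site_mutations_length[sites[i]]);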
tsk_memset(&tmp, 0, sizeof(tmp));
ret = tsk_bitset_init(&tmp, num_samples, 1);
if (ret != 0) {
Use goto out for consistency (in case we ever add something above it which needs cleanup).
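Applied to the init call above, that would read (sketch):

ret = tsk_bitset_init(&tmp, num_samples, 1);
if (ret != 0) {
    goto out;
}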
    out->size = self->size;
    out->data = self->data + (row * self->size);
}
#define BITSET_DATA_ROW(bs, row) (bs)->data + (row) * (bs)->row_len
Again, it's safer to wrap the full expression of the macro in parentheses, as unexpected things can happen when it is used with operators of different precedence.
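One concrete failure mode, using the macro above (the unsafe/safe pair here is illustrative):

#define ROW_UNSAFE(bs, row) (bs)->data + (row) * (bs)->row_len
#define ROW_SAFE(bs, row) ((bs)->data + (row) * (bs)->row_len)

/* *ROW_UNSAFE(bs, 2) expands to *(bs)->data + (2) * (bs)->row_len:
 * the dereference binds only to data, so this reads the first chunk
 * and adds an offset to its value, rather than reading row 2.
 * *ROW_SAFE(bs, 2) dereferences the intended address. */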